
Use sorted slices to represent HDR histogram #52

Merged
merged 11 commits into elastic:main from sorted-slice-hdr on Jul 31, 2023

Conversation

lahsivjar
Contributor

@lahsivjar lahsivjar commented Jul 28, 2023

An alternative to #49. Uses sorted slices to store HDR histograms; the slices are kept sorted at all times so that merges can be performed efficiently.
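For illustration only, here is a minimal Go sketch of the idea, not the actual implementation in this PR: the histogram is held as parallel sorted bucket/count slices (invented type and field names), and two histograms are merged in a single linear pass over both slices.

```go
package main

import "fmt"

// hist is a hypothetical histogram representation: parallel slices where
// buckets is kept sorted and counts[i] is the count for buckets[i].
type hist struct {
	buckets []int64
	counts  []int64
}

// merge folds src into dst with one linear pass over both sorted slices,
// summing counts when the same bucket appears in both histograms.
func (dst *hist) merge(src *hist) {
	mergedB := make([]int64, 0, len(dst.buckets)+len(src.buckets))
	mergedC := make([]int64, 0, len(dst.counts)+len(src.counts))
	i, j := 0, 0
	for i < len(dst.buckets) && j < len(src.buckets) {
		switch {
		case dst.buckets[i] < src.buckets[j]:
			mergedB = append(mergedB, dst.buckets[i])
			mergedC = append(mergedC, dst.counts[i])
			i++
		case dst.buckets[i] > src.buckets[j]:
			mergedB = append(mergedB, src.buckets[j])
			mergedC = append(mergedC, src.counts[j])
			j++
		default: // same bucket in both: sum the counts
			mergedB = append(mergedB, dst.buckets[i])
			mergedC = append(mergedC, dst.counts[i]+src.counts[j])
			i++
			j++
		}
	}
	mergedB = append(mergedB, dst.buckets[i:]...)
	mergedC = append(mergedC, dst.counts[i:]...)
	mergedB = append(mergedB, src.buckets[j:]...)
	mergedC = append(mergedC, src.counts[j:]...)
	dst.buckets, dst.counts = mergedB, mergedC
}

func main() {
	a := &hist{buckets: []int64{1, 4, 9}, counts: []int64{2, 1, 5}}
	b := &hist{buckets: []int64{4, 7}, counts: []int64{3, 1}}
	a.merge(b)
	fmt.Println(a.buckets, a.counts) // [1 4 7 9] [2 4 1 5]
}
```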

Benchmark (thanks to #50):

name                           old time/op    new time/op    delta
NDJSON/AgentNodeJS-10            33.9ms ± 1%    15.3ms ± 3%  -54.73%  (p=0.000 n=10+10)
NDJSON/AgentPython-10            58.7ms ± 1%    44.1ms ± 2%  -24.76%  (p=0.000 n=9+10)
NDJSON/AgentRuby-10              49.7ms ± 1%    23.7ms ± 2%  -52.36%  (p=0.000 n=9+10)
NDJSON/AgentGo-10                76.2ms ± 1%    26.6ms ± 1%  -65.13%  (p=0.000 n=10+10)
NDJSONParallel/AgentNodeJS-10    33.9ms ± 1%    15.5ms ± 2%  -54.33%  (p=0.000 n=10+10)
NDJSONParallel/AgentPython-10    59.5ms ± 3%    44.1ms ± 2%  -25.82%  (p=0.000 n=10+10)
NDJSONParallel/AgentRuby-10      50.2ms ± 2%    23.4ms ± 3%  -53.42%  (p=0.000 n=10+10)
NDJSONParallel/AgentGo-10        77.0ms ± 2%    26.8ms ± 1%  -65.25%  (p=0.000 n=10+10)

name                           old alloc/op   new alloc/op   delta
NDJSON/AgentNodeJS-10            7.74MB ± 0%    7.58MB ± 0%   -2.04%  (p=0.000 n=10+9)
NDJSON/AgentPython-10            21.4MB ± 1%    20.9MB ± 1%   -2.41%  (p=0.000 n=10+10)
NDJSON/AgentRuby-10              12.0MB ± 0%    11.7MB ± 0%   -1.95%  (p=0.000 n=10+10)
NDJSON/AgentGo-10                15.2MB ± 1%    14.0MB ± 0%   -8.18%  (p=0.000 n=10+10)
NDJSONParallel/AgentNodeJS-10    7.73MB ± 0%    7.59MB ± 0%   -1.76%  (p=0.000 n=10+10)
NDJSONParallel/AgentPython-10    21.4MB ± 1%    20.9MB ± 0%   -2.54%  (p=0.000 n=10+10)
NDJSONParallel/AgentRuby-10      12.0MB ± 0%    11.7MB ± 0%   -1.96%  (p=0.000 n=9+10)
NDJSONParallel/AgentGo-10        15.2MB ± 1%    14.0MB ± 0%   -8.21%  (p=0.000 n=9+9)

name                           old allocs/op  new allocs/op  delta
NDJSON/AgentNodeJS-10             83.4k ± 0%     83.1k ± 0%   -0.36%  (p=0.000 n=10+10)
NDJSON/AgentPython-10              221k ± 1%      223k ± 1%   +0.87%  (p=0.009 n=10+10)
NDJSON/AgentRuby-10                134k ± 0%      134k ± 0%   +0.30%  (p=0.000 n=10+9)
NDJSON/AgentGo-10                  165k ± 0%      162k ± 0%   -1.86%  (p=0.000 n=10+8)
NDJSONParallel/AgentNodeJS-10     83.4k ± 0%     83.1k ± 0%   -0.36%  (p=0.000 n=10+10)
NDJSONParallel/AgentPython-10      222k ± 1%      222k ± 1%     ~     (p=0.165 n=10+10)
NDJSONParallel/AgentRuby-10        134k ± 0%      134k ± 0%   +0.29%  (p=0.000 n=10+7)

More benchmarks:

name                         old time/op    new time/op    delta
AggregateCombinedMetrics-10    1.64µs ± 2%    1.59µs ± 3%   -2.98%  (p=0.000 n=10+9)
AggregateBatchSerial-10        5.44µs ± 5%    4.76µs ± 2%  -12.48%  (p=0.000 n=10+10)
AggregateBatchParallel-10      5.49µs ± 2%    4.83µs ± 3%  -12.04%  (p=0.000 n=8+10)
CombinedMetricsEncoding-10      982ns ± 2%     960ns ± 0%   -2.27%  (p=0.000 n=9+7)
CombinedMetricsToBatch-10       201µs ± 1%      89µs ± 0%  -55.69%  (p=0.000 n=8+10)
EventToCombinedMetrics-10       312ns ± 1%     310ns ± 0%   -0.73%  (p=0.000 n=10+10)

name                         old alloc/op   new alloc/op   delta
AggregateCombinedMetrics-10    1.49kB ± 0%    1.45kB ± 1%   -2.93%  (p=0.000 n=9+10)
AggregateBatchSerial-10        2.66kB ± 4%    2.54kB ± 4%   -4.56%  (p=0.001 n=10+10)
AggregateBatchParallel-10      2.63kB ± 2%    2.55kB ± 4%   -2.89%  (p=0.003 n=9+10)
CombinedMetricsEncoding-10       654B ± 1%      640B ± 1%   -2.09%  (p=0.000 n=9+9)
CombinedMetricsToBatch-10      4.09kB ± 0%    4.08kB ± 0%   -0.27%  (p=0.000 n=9+10)
EventToCombinedMetrics-10       0.00B          0.00B          ~     (all equal)

name                         old allocs/op  new allocs/op  delta
AggregateCombinedMetrics-10      23.0 ± 0%      21.0 ± 0%   -8.70%  (p=0.000 n=10+10)
AggregateBatchSerial-10          31.6 ± 5%      26.1 ± 4%  -17.29%  (p=0.000 n=9+10)
AggregateBatchParallel-10        31.4 ± 2%      26.4 ± 5%  -15.92%  (p=0.000 n=10+10)
CombinedMetricsEncoding-10       0.00           0.00          ~     (all equal)
CombinedMetricsToBatch-10        95.0 ± 0%      95.0 ± 0%     ~     (all equal)
EventToCombinedMetrics-10        0.00           0.00          ~     (all equal)

@lahsivjar lahsivjar marked this pull request as ready for review July 29, 2023 03:25
@lahsivjar lahsivjar requested a review from a team as a code owner July 29, 2023 03:25
aggregators/codec.go (outdated review thread, resolved)
proto/aggregation.proto (outdated review thread, resolved)
@@ -152,3 +139,16 @@ message Overflow {
bytes overflow_service_transactions_estimator = 5;
bytes overflow_spans_estimator = 6;
}

message Bar {
Member


HDRHistogramBar?

IIANM, adding this message will increase the amount of pointer indirection and memory pressure compared to #49.

OTOH I like the approach of keeping the histogram sorted at all times. Did you already consider and investigate doing this with the existing pair of arrays representation?
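For context, a rough sketch of the two Go shapes being weighed here (hypothetical type names; the real generated protobuf structs differ in detail): a repeated Bar message field is generated as a slice of pointers, whereas the existing pair-of-arrays representation keeps two flat slices.

```go
package sketch

// With a Bar message, the generated Go code holds a slice of pointers, so
// every bucket is a separate heap object reached through an extra indirection.
type Bar struct {
	Count  int64
	Bucket int64
}

type histogramWithBars struct {
	Bars []*Bar
}

// With the pair-of-arrays representation, counts and buckets are two flat,
// contiguous slices: no per-bucket allocations or pointer chasing.
type histogramWithArrays struct {
	Counts  []int64
	Buckets []int64
}
```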

Contributor Author


> Did you already consider and investigate doing this with the existing pair of arrays representation?

Good point, I hadn't tried that. I will use counts and buckets instead of Bar for the proto representation.

> OTOH I like the approach of keeping the histogram sorted at all times

Another advantage of this is that it makes the Buckets() calculation more efficient (which is why we see the improvement in CombinedMetricsToBatch w.r.t. time/op).

Contributor Author


Updated the code to remove Bar message from proto definition.

@lahsivjar lahsivjar requested a review from axw July 31, 2023 03:39
Member

@axw axw left a comment


Looks good overall, just wondering if we can further simplify the HistogramRepresentation code.

aggregators/internal/hdrhistogram/hdrhistogram.go (outdated review thread, resolved)
aggregators/internal/hdrhistogram/hdrhistogram.go (outdated review thread, resolved)
for idx := 0; iter.next(); idx++ {
	if bucketsSeen == h.CountsRep.Len() {
		break
	}
	iter.nextCountAtIdx()
Member


I'm wondering if the iterator type is still needed, and whether we could simplify things. Would it be possible to store the bucket value, rather than index, in CountsRep? Then Buckets is simply iterating through and copying to slices.

Contributor Author


> Would it be possible to store the bucket value, rather than index, in CountsRep?

Do you mean saving the highestEquivalentValue for each bucket instead of the index? That value is currently calculated from the bucket and sub-bucket indices, and I think it would require some re-implementation of the core logic.

Contributor Author

@lahsivjar lahsivjar Jul 31, 2023


I might be able to use the bucketIdx to derive the highestEquivalentValue directly without requiring the subBucket (which is the whole reason I kept the iterator). Giving this a shot now.

This doesn't work because we require the subBucketIdx even in this case. I think we might be able to do some refactoring to achieve a simpler state but maybe in a future PR?
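For reference, a small sketch of the standard HdrHistogram index math (modelled on github.com/HdrHistogram/hdrhistogram-go; function and parameter names are illustrative, not this repository's identifiers), showing why reconstructing a value needs both bucketIdx and subBucketIdx:

```go
package sketch

// valueFromIndex reconstructs the lowest value that maps to the given
// bucket/sub-bucket pair; both indices are required because the sub-bucket
// supplies the mantissa and the bucket supplies the shift.
func valueFromIndex(bucketIdx, subBucketIdx, unitMagnitude int32) int64 {
	return int64(subBucketIdx) << uint(bucketIdx+unitMagnitude)
}

// highestEquivalentValue is the largest value that maps to the same
// bucket/sub-bucket pair: the next bucket boundary minus one.
func highestEquivalentValue(bucketIdx, subBucketIdx, unitMagnitude int32) int64 {
	rangeSize := int64(1) << uint(bucketIdx+unitMagnitude)
	return valueFromIndex(bucketIdx, subBucketIdx, unitMagnitude) + rangeSize - 1
}
```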

Member


👍 let's take it to a follow-up

aggregators/merger.go (outdated review thread, resolved)
lahsivjar and others added 3 commits July 31, 2023 13:41
Co-authored-by: Andrew Wilkins <axwalk@gmail.com>
@lahsivjar lahsivjar merged commit 4005592 into elastic:main Jul 31, 2023
2 checks passed
@lahsivjar lahsivjar deleted the sorted-slice-hdr branch July 31, 2023 07:24
Comment on lines +441 to +442
to.Buckets = slices.Grow(to.Buckets, requiredLen)
to.Counts = slices.Grow(to.Counts, requiredLen)
Member


I believe there's a slight issue with slices.Grow here, since slices.Grow(s, n) ensures there's extra space for n more elements. The right call here should be to.Buckets = slices.Grow(to.Buckets, requiredLen-len(to.Buckets)).
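To illustrate the semantics being pointed out (the variable names here are placeholders, not the actual fields in the PR):

```go
package main

import (
	"fmt"
	"slices"
)

func main() {
	buckets := make([]int64, 3)
	requiredLen := 10

	// slices.Grow(s, n) guarantees capacity for n *additional* elements,
	// so passing requiredLen directly over-allocates once len(s) > 0.
	overGrown := slices.Grow(buckets, requiredLen)
	fmt.Println(cap(overGrown) >= len(buckets)+requiredLen) // true

	// Passing the shortfall asks only for enough headroom to reach
	// requiredLen total elements, matching the suggested fix.
	fixed := slices.Grow(buckets, requiredLen-len(buckets))
	fmt.Println(cap(fixed) >= requiredLen) // true
}
```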

Member


Added #56
